Demand-Only Broadcast: Reducing Register File and Bypass Power in Clustered Execution Cores

Authors

  • Mary D. Brown
  • Yale N. Patt
Abstract

This paper introduces a technique called Demand-Only Broadcast that reduces the power consumption of the register file and result bypass network in a clustered execution core. With this technique, an instruction’s result is broadcast within remote clusters only if it is needed by dependants in those clusters. Demand-Only Broadcast was evaluated using a performance–power simulator of a high-performance clustered processor which already employs techniques for reducing register file and instruction window power. By eliminating 59% of the register file writes and intra-cluster broadcasts, the total processor power consumption (including the hardware needed by this mechanism) is reduced by 10%, while having less than a 1% impact on IPC. Demand-Only Broadcast also results in a 10% higher IPC and 4% lower power consumption than a clustered processor with a partitioned register file.
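
The core idea in the abstract, delivering a result to a remote cluster only when a dependant lives there, can be illustrated with a toy counting model. The sketch below is not the paper's hardware mechanism, which the abstract does not describe; the `Instr` record, the `remote_broadcasts` function, and the assumption that every instruction produces exactly one result are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    cluster: int          # cluster where the instruction executes (assumed known)
    sources: tuple = ()   # names of the producer instructions it reads from


def remote_broadcasts(instrs, num_clusters, demand_only):
    """Count (producer, remote cluster) result deliveries in a toy model."""
    home = {i.name: i.cluster for i in instrs}
    # For each producer, collect the remote clusters that hold a dependant.
    needed = {i.name: set() for i in instrs}
    for i in instrs:
        for src in i.sources:
            if home[src] != i.cluster:
                needed[src].add(i.cluster)
    if demand_only:
        # Deliver a result only to the remote clusters that need it.
        return sum(len(clusters) for clusters in needed.values())
    # Baseline assumption: every result is sent to every other cluster.
    return len(instrs) * (num_clusters - 1)


# Hypothetical four-instruction dependence chain spread over four clusters.
program = [
    Instr("a", 0),
    Instr("b", 0, ("a",)),   # local consumer of a: no remote delivery needed
    Instr("c", 1, ("a",)),   # remote consumer of a
    Instr("d", 2, ("c",)),   # remote consumer of c
]
print(remote_broadcasts(program, 4, demand_only=False))
print(remote_broadcasts(program, 4, demand_only=True))
```

In this toy example the full-broadcast baseline performs 12 remote deliveries and the demand-only policy performs 2. The model ignores how a real core would discover remote dependants in time, so it says nothing about the 59% or 10% figures reported in the abstract.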

Related resources

Multicycle Broadcast Bypass: Too Readily Overlooked

The bypass path, also called the forwarding path, allows processors to broadcast operands from one functional unit to another more quickly than through the register file. In modern superscalar out-of-order CPUs bypass is part of the execution pipeline stage, allowing dependant instructions to issue on subsequent cycles. In these modern machines, however, the bypass network complexity is becomin...

Intra-level Incomplete Bypassing: Achieving Performance and Power Efficiency

Villasenor, Eric P. M.S.E.C.E., Purdue University, December, 2007. Intra-level Incomplete Bypassing: Achieving Performance and Power Efficiency . Major Professor: Mithuna S. Thottethodi. Researchers have proposed clustered microarchitectures to capture the benefits of high performance and high energy efficiency. Typically, clustered microarchitectures offer fast local bypasses (i.e., value forw...

Hierarchical Clustered Register File Organization for VLIW Processors

Technology projections indicate that wire delays will become one of the biggest constraints in future microprocessor designs. To avoid long wire delays and therefore long cycle times, processor cores must be partitioned into components so that most of the communication is done locally. In this paper, we propose a novel register file organization for VLIW cores that combines clustering with a hi...

Compiler-assisted power optimization for clustered VLIW architectures

Clustered VLIW architectures solve the scalability problem associated with flat VLIW architectures by partitioning the register file and connecting only a subset of the functional units to a register file. However, inter-cluster communication in clustered architectures leads to increased leakage in functional components and a high number of register accesses. In this paper, we propose compiler ...

Exploring Energy-Performance Trade-Offs for Heterogeneous Interconnect Clustered VLIW Processors

Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving clock speed, reducing energy consumption of the logic, and making design simpler, it introduces extra overheads by way of inter-cluster communication. This communication ...

Publication date: 2004